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Abstract: The accurate quantification of gene expression levels is crucial 
for transcriptome study. Microarray has been used as a main platform for 
simultaneously interrogating thousands of genes in the past decade, and 
recently RNA-Seq has emerged as a promising alternative. The gene ex- 
pression measurements obtained by microarray and RNA-Seq are however 
subject to various measurement errors. A third platform called qRT-PCR 
is acknowledged to provide more accurate quantification of gene expression 
levels than microarray and RNA-Seq, but it has limited throughput capac- 
ity. In this article, we propose to use a system of functional measurement 
error models to model gene expression measurements and calibrate the mi- 
croarray and RNA-Seq platforms with qRT-PCR. Based on the system, a 
two-step approach was developed to estimate the biases and error variance 
components of the three platforms and calculate calibrated estimates of 
gene expression levels. The estimated biases and variance components shed 
light on the relative strengths and weaknesses of the three platforms and 
the calibrated estimates provide a more accurate and consistent quantifi- 
cation of gene expression levels. Theoretical and simulation studies were 
conducted to establish the properties of those estimates. The system was 
applied to analyze two gene expression data from the Microarray Quality 
Control (MAQC) and Sequencing Quality Control (SEQC) projects. 

Keywords and phrases: Transcriptome Profiling, Gene Differential Ex- 
pression, Comparative Calibration, Functional and Structural Parameters. 

1. Introduction 



The transcriptome of a cell is the entire set of RNA molecules or transcripts 
produced by DNA transcription under certain biological or environmental con- 
ditions. Systematic profiling of the transcriptome can not only provide a dy- 
namic characterization of the cell's molecular constitution, but also shed light 
on gene functional annotation, regulatory mechanisms, and transcriptional net- 
works underlying various biological processes of the cell. Since the mid-1990s, 
DNA microarray has served as the leading experimental platform for transcrip- 
tome study (Schena ct al. [20]). Despite its huge success, microarray is known to 
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suffer from some limitations sucli as reliance on existing knowledge of transcript 
sequences, high level background noise and limited dynamic range of detection. 

Recently, a new experimental platform called RNA-Seq, which is based on 
Next Generation Sequencing (NGS) technologies, has emerged as a promis- 
ing alternative to microarray for transcriptome profiling. The massive parallel 
throughput capacity of RNA-Seq provides whole genome coverage as well as 
single nucleotide resolution. Therefore RNA-Seq overcomes major limitations of 
microarray (Wang, Gerstein and Snyder [24]). Data generated from RNA-Seq 
experiments still demonstrate excessive variability. A number of studies pub- 
lished in the literature found that RNA-Seq is subject to various types of bias 
and variation attributable to a variety of sources; see Schraga, Oren and Ast 
[21] for a comprehensive review. Therefore, similar to microarray data, RNA-Seq 
data need to be normalized for downstream transcriptome analysis. 

A number of methods have been proposed in the literature for normaliz- 
ing RNA-Seq data. The RPKM method proposed by Mortazavi et al. [16] is 
arguably the most widely used method, which summarizes RNA-Seq data into 
Reads Per Kilobase of exon model per Million mapped reads for measuring gene 
or transcript expression levels. The RPKM method is intuitive and easy to im- 
plement, however, it ignores biases and variations observed in RNA-Seq data. 
In order to improve upon the RPKM method, various statistical model-based 
methods have been proposed, including GPseq (Srivastava and Chen [23]), mseq 
(Li, Jiang and Wong [14]), and POME (Hu et al. [12]). Despite reported im- 
provements of the newly proposed methods over RPKM, the problem of how 
to statistically characterize and further normalize RNA-Seq data has not been 
settled with satisfaction and demands further investigation (Mak [15]). 

For gene expression quantification, a third platform called the quantitative 
real-time Reverse Transcription Polymerase Chain Reaction (qRT-PCR) can 
also be used. Acknowledged as the most reliable gene expression measurement 
technology, qRT-PCR is sensitive to low levels of gene expression, has a wide 
dynamic range of detection, and is highly reproducible (Bustin [6]). Nonetheless, 
qRT-PCR is not perfect and also subject to variations caused by various bio- 
logical, technical, and experimental factors involved in qRT-PCR experiments 
(Bustin and Nolan [7]). The major disadvantage of qRT-PCR is that it is not 
a massive parallel technology, and thus the number of genes or transcripts that 
can be measured in a single qRT-PCR experiment is limited. This disadvantage 
prevents qRT-PCR from being used for large scale transcriptome studies. Cur- 
rently, qRT-PCR is either used for detecting and quantifying a small number of 
specific transcript targets or as a gold standard for validating hits or findings 
from other high-throughput platforms such as microarray (Applied Biosystems 
[2]). These two uses of qRT-PCR have not fully realized the potential of the 
technology for genome-wide gene expression profiling. 

The strengths and weaknesses of qRT-PCR, RNA-Seq and microarray can be 
summarized as follows. In terms of throughput capacity, RNA-Seq is the high- 
est, microarray is the second, and qRT-PCR is the lowest; whereas, in terms 
of accuracy, the order from the highest to the lowest is qRT-PCR, RNA-Seq, 
and microarray. An ideal platform for transcriptome profiling should combine 
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the accuracy of qRT-PCR with the high-throughput capacity of RNA-Seq. Un- 
fortunately, such a platform is not currently available yet. A natural question 
is whether it is possible to use statistical methods to combine the strengths 
of the three platforms and generate gene expression measurements of a higher 
quality at the genome-wide scale. The answer turns out to be positive, and the 
methodology that can be applied is statistical calibration. 

Statistical calibration is typically used for the scenario where p instruments 
are available for measuring the same quantity. Among the instruments, one 
is more accurate but at the same time more expensive than the others. In 
order to make a less accurate instrument generate more reliable measurement 
results for general use, it needs to be calibrated by the most accurate instrument 
through statistical analysis. There are two different types of calibration, which 
are absolute calibration and comparative calibration. The former refers to the 
type of calibration performed when the most accurate instrument gives the 
true value of the targeted quantity, whereas the latter refers to the type of 
calibration performed when the most accurate instrument is also subject to 
measurement error. The literature on statistical calibration is primarily focused 
on absolute calibration, and a variety of statistical methods have been developed; 
see Osborne [17] for a comprehensive review. On the other hand, comparative 
calibration is more challenging than absolute calibration and is mainly discussed 
in the literature on measurement error models; see Fuller [10] and Cheng and 
Van Ness [8] for more thorough discussions. 

The platforms for measuring gene expression levels using qRT-PCR, microar- 
ray, and RNA-Seq represent three different instruments, with the qRT-PCR 
platform being the most accurate and the most expensive. In this article, we 
propose to use a system of measurement error (ME) models to model gene ex- 
pression measurements obtained by the qRT-PCR, microarray, and RNA-Seq 
platforms, and to further use the system to calibrate RNA-Seq and microarray 
measurements with qRT-PCR measurements. A two-step approach is used to 
estimate the parameters of the system of ME models, and the statistical prop- 
erties of the resulting estimates are discussed. Through both theoretical and 
simulation studies, we show that the calibrated gene expression measurements 
are more consistent and accurate than those by any of the individual platforms. 
Furthermore, we apply the system of ME models to calibrate gene expression 
data generated by qRT-PCR, RNA-Seq, and microarray from both the Microar- 
ray Quality Control (MAQC) and Sequencing Quality Control (SEQC) projects 
(detailed information about the data is given in Section 4.1), and show that the 
resulting calibrated measurements provide more accurate quantification of gene 
expression levels and have more power in gene differential expression analysis. 

The rest of the article is organized as follows. In Section 2, we define the 
system of measurement error (ME) models and describe the two-step approach 
for estimating the parameters in the system. In Section 3, we present some 
simulation results. In Section 4, we describe the gene expression data from the 
MAQC and SEQC projects and discuss the results from applying the system 
of ME models to analyze the data. Section 5 concludes the article with further 
discussion and possible future research. An Appendix is enclosed at the end of 
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this article. 

2. Measurement Error Models and Calibration 

Let T denote the cohection of ah the genes or transcripts in the transcriptome 
under study. As discussed in the Introduction, RNA-Seq has the capacity to 
measure the expression levels of all genes in T, microarray can target thousands 
of genes, and qRT-PCR will generally measure hundreds of genes at most in a 
single experiment. Let A, B, and C be the collections of genes measured by qRT- 
PCR, microarray, and RNA-Seq, respectively. We assume that A C B C C C T, 
and the cardinalities of A, B, and C are n, m, and respectively. By the as- 
sumption, n genes (i.e. those in A) are measured by all three platforms, m ~ n 
genes (i.e. those in i3 — ^) are measured by both microarray and RNA-Seq, and 
I — m genes (i.e. those in C — S) are measured only by RNA-Seq. We further 
assume that commonly used platform-specific methods are used to process and 
normalize raw data generated by each platform to produce gene expression mea- 
surement data. Some examples are the FLIER method (Affymctrix Inc. [1]) for 
microarray, the RPKM method for RNA-Seq, and the delta-delta Ct method for 
qRT-PCR. Following the convention in transcriptome studies, the log-2 trans- 
formation is further applied to the normalized gene expression measurements of 
the three platforms, and the resulting values are referred to as the qRT-PCR, 
microarray and RNA-Seq gene expression measurements, respectively, in the 
rest of this article. 

Each type of gene expression measurement data can be generated by one 
lab or multiple labs. Based on the number of labs involved in generating gene 
expression measurement data, we distinguish two scenarios, which are the single- 
lab scenario and the multi-lab scenario. In the single-lab scenario, each type of 
gene expression measurement data is generated by a single lab. Note that the 
labs in the single-lab scenario may be the same lab. In the multi-lab scenario at 
least one type of gene expression data is generated by more than one lab. The 
rationale for separating these two scenarios is that variability attributable to 
labs can be assessed under the multi-lab scenario but not under the single-lab 
scenario. 

2.1. Single-lab Scenario 

We assume each lab generates one expression measurement for each gene. When 
technical replicates are available within one lab, their average value is used as 
the expression measurement of a gene. Suppose that the genes in ^, B — A, 
and C — B are labeled by integers from 1 to n. n -I- 1 to to, and m -f 1 to Z, 
respectively. For 1 < j < n, let Xj denote the expression measurement of gene 
j by qRT-PCR. Similarly, let Yj denote the expression measurement of gene j 
by microarray for 1 < J < to, and Zj the expression measurement of gene j by 
RNA-Seq for 1 < j < I. Note that Xj, Yj and Zj are all in log-2 scales. 
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As discussed in the Introduction, gene expression measurements produced by 
the three platforms are ah subject to measurement errors. We propose to use the 
following measurement error models to characterize Xj , Yj and Zj , respectively, 



Note that the above models are not simple linear models. All terms in the 
right-hand side of the equations are not observed, and there are cross terms of 
unknown quantities. Next, we will discuss the terms in these three models in 
detail and propose the estimation method. 

In model (2.1a), iij is the true expression level of gene j, eij is the measure- 
ment error due to the qRT-PCR platform. Depending on what normalization 
method is used, qRT-PCR can lead to either absolute quantitation or relative 
quantitation of a gene's expression level (Pfaffl [18, Chapter 3]). Therefore, the 
interpretation of /ij depends on whether absolute or relative quantitation is used 
in experiment. For simplicity, in this article, we do not further distinguish these 
two qantitation methods and simply refer to fj.j as the true expression value of 
gene j. We assume that for 1 < j < n, ey's are i.i.d. iV(0, cr^). The variance of 
Xj is af, representing the reproducibility of the qRT-PCR platform. According 
to the model, the qRT-PCR measurement Xj is unbiased with respect to Hj. 

In model (2.1b), e2j is the measurement error due to the microarray plat- 
form. For 1 < j < m, we assume e2j's are i.i.d. iV(0, (t|). The other two terms 
form the mean part of the model, i.e., EiYjr) — a2 + /32Mi- other words, 
the mean of the microarray measurement of gene j is assumed to be a linear 
function of the true expression level /Xj, where a2 and /32 are the intercept and 
slope, respectively, with ^2 representing the scale of the microarray measure- 
ments relative to the qRT-PCR measurements. The variance of Yj is cr|. When 
comparing the reproducibilities of the qRT-PCR and microarray platforms, Yj 
needs to be transformed to Yj = (Yj — a2)//32, which is of the same scale as Xj. 
The variance of Yj, which is cr|//3|, is referred to as the reproducibility of the 
microarray platform. 

In model (2.1c), the mean of the RNA-Seq measurement is also assumed to 
be a linear function of ^j, i.e., E[Zj) = + Psl^j, and e^j is the measurement 
error due to the RNA-Seq platform. We assume that for 1 < j <l, es^'s are i.i.d. 
A^(0, (t|). The slope represents the scale of RNA-Seq measurements relative to 
the qRT-PCR measurements. The variance of Zj is (t|. Similar to the microarray 
platform, we refer to crl/ 0^ as the reproducibility of the RNA-Seq platform. 

Different measurement platforms generally have different dynamic ranges of 
detection. This is also the case for the qRT-PCR, microarray and RNA-Seq 
platforms. The models proposed above may only hold for a range that fits all 
three platforms. Therefore, when applying the models in practice, a proper 
range of expression levels needs to be used, and genes with extremely low or 
high expression levels need to be excluded. Measurement errors are typically 
heteroscedastic, that is, the variance of a measurement error depends on the 




lij + €ij (1 < j < n) , 

0L2 + /32Mi + <^2j (1 < j < m) 
"3 + ^33^J'j + esj (1 < j < I) ■ 



(2.1a) 
(2.1b) 
(2.1c) 
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magnitude of the targeted quantity. Notice that the errors in the models above 
are assumed to homoscedastic. The justification for assuming homoscedastic er- 
rors is two-fold. First, the gene expression measurements Xj, Yj, and Zj are 
the log-2 values of the original measurements produced by the three platforms, 
respectively, and the log-2 transformation is known to mitigate heteroscedas- 
ticity. Second, as previously discussed, the proposed models will be applied to 
genes with expression levels within a certain proper range, which further allevi- 
ates the concern of heteroscedasticity. Nonetheless, when applying the models, 
diagnostic analysis always needs to be performed. 

When fij's in models (2.1a)-(2.1c) are assumed to be unknown fixed quan- 
tities, the measurement error models are said to be functional; and when /^j's 
are assumed to be i.i.d. random variables, the models are said to be structural. 
In this article, ^j's are gene expression values and considered fixed. Therefore, 
the models defined above are functional. Following the convention in the liter- 
ature, we call the ^j's incidental parameters, and the other parameters (a^, Pi 
for 2 < I < 3, and af for 1 < ? < 3) structural parameters. 

In general, when p measurement platforms or instruments are used to measure 
the same quantity, it requires p measurement error models to characterize the 
measurement results. For ease of discussion, the p measurement error models 
are said to form a system of ME models of order p. When the measurement error 
models are structural, the system is said to be structural; and when the models 
are functional, the system is said to be functional. Hence, models (2.1a)-(2.1c) 
defined above form a functional system of order 3, and any two of them form a 
functional system of order 2. 

Parameter Estimation The statistical literature on measurement error mod- 
els is primarily focused on systems of order 2. Reiers0l [19] showed that struc- 
tural systems of order 2 are not identifiable under the normality assumption, 
unless additional information such as the ratio between the variances of the two 
measurement instruments is available. For structural systems of order higher 
than 2 (e.g. p = 3), Barnett [3] showed they become identifiable with no fur- 
ther information needed, and the maximum likelihood estimates (MLEs) of the 
structural parameters can be obtained. 

Parameter estimation for functional systems of ME models is more difficult 
than that for structural systems due to the presence of the incidental parame- 
ters, Hj^s. First of all, under the normality assumption on the involved measure- 
ment errors, the likelihood function of a functional system becomes unbounded 
(Kendall and Stuart [13]). Solari [22] further showed that the likelihood func- 
tion of a functional system of order 2 does not have a local maximum. We have 
found that this is also generally true for functional systems of order 3. There- 
fore, the maximum likelihood method fails to produce proper estimates for both 
structural and incidental parameters of a functional system of ME models. 

The relationship between structural and functional systems was discussed 
in the early literature on measurement error models. Much attention has been 
given to the connection between the identifiability of structural systems and the 
existence of proper estimates of the structural parameters in functional systems. 
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One major result states that when a structural system is identifiable, then the 
MLEs for the structural system are still proper estimates for the structural 
parameters of the corresponding functional system (Gleser [11]). 

Based on the discussion above, a two-step approach can be used to estimate 
the parameters of the functional system of order 3 defined above for the qRT- 
PCR, microarray and RNA-Seq measurements. In the first step, the genes with 
measurements from all three platforms (i.e. genes in A) are used to obtain the 
MLEs of the structural parameters under the assumption that these iij's are 
i.i.d. N{iJ,,a'^). In the second step, the estimates obtained in the first step are 
used in place of their corresponding structural parameters in the functional 
ME system, and then the generalized least squares method is used to obtain 
estimates of the incidental parameters (or gene expression levels). 

Structural Parameters We use the expression nieasurernents of the genes 
in A to estimate the structural parameters. Let X , Y,, and Z, be the averages 
of Xj , Yj , and Zj over all genes in A. The sample variances and covariances for 

l<j<n, {^"}l<J<nj ^^'^ {^j}i<j<n ^-I'C dcnOtcd aS Sxx, Syy, Szz, Sxy, Sxz, 

and Syz , respectively. Following the first step of the two-step approach discussed 
previously, the estimates of the structural parameters are calculated and given 
as follows. 

/32 = |^, /33 = |^, a2^Y-^2X., as^Z.-l32X., (2.2a) 
al = (Sxx - , al = (Syy - , al = (Szz - . (2.2b) 

V / \ ^xz J \ Oxy J 

Note that when deriving the above estimates, the non-negativity constraints on 
the variance estimates were not enforced due to two considerations. First, when 
analyzing real data, the constraints are usually satisfied automatically, and thus 
the benefit of using more complicated estimates is minimal; second, when a 
constraint is violated, it is usually an indication that the underlying variance 
is negligible and thus the negative estimate can be automatically converted to 
zero. 

For convenience, let 6 = (^a2, as, (32, f^s, erf ,<T2,<t^) be the vector of the 
structural parameters, and 6 the estimate of 9. The following proposition es- 
tablishes the asymptotic distribution of 9 under the functional system of ME 
models as n goes to oo. 

Proposition 1. Assume the functional system of ME models is true and the 
following limits exist, 

jl = lim —'S^Ui, and A= lim — (u,- — u)^ > 0. (2.3) 

^ )i^oo 71 ^ ri-i-oo n ^ 

Then, as n goes to oo, \fn {9 — 9^ — ^ X (0, Fg), where — ^ means converge in 

distribution, and Fg is the variance- covariance matrix with its explicit expression 
given in Appendix A. 2. 
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The proof of Proposition 1 is outlined in Appendix A.l. The asymptotic 
property of 6 is stated as n goes to oo. In practice, this property will hold ap- 
proximately when n is sufficiently large. As will be seen in the simulation study 
in Section 3, n > 150 is large enough for the approximation to be sufficiently 
accurate. The assumptions on the mean and variance of fij^s in Proposition 1 
appear to be reasonable, because ^j's are the true expression levels of genes in 
a given cell line or tissue sample. 

Incidental Parameters How to best estimate the incidental parameter /jj 
depends on whether gene j is in ^, S — ^, or C — B. We consider these three 
cases separately. For each case, we follow the second step of the two-step ap- 
proach discussed previously. First, we replace the structural parameters by their 
estimates, and then we apply the generalized least squares method to obtain the 
estimate of /ij. 

For each j e A, given the structural parameters, the measurement error 
models involving fij can be rewritten as a linear model of iij (Appendix B). 
Replacing the structural parameters 9 by their estimates 0, and applying the 
generalized least squares method, the estimate for fij is 

^xyz ^ X,/af + /32 (y, - a2) lol + h {Z, - as) /^l ,^ 
llcrl+Pl/al+Pllal 

We call jl^^^ the calibrated estimate of for j £ A, because it incorporates all of 
the information in the three types of measurements available for fXj . The quality 
of jl^^^ depends on how accurate the estimates of the structural parameters are, 
which further depends n, the number of genes in A. The properties of fl^^^ as 
n goes to oo are given in the following proposition. 

Proposition 2. Assume the functional system of ME models is true, and the 
limits in (2.3) exist. For each j (z A, as n goes to oo, the calibrated estimate 
fi^y^ of fij asymptotically follows the normal distribution iV(^j, 7^), and the 
expectation and variance of (1^^ admit the following expansions: 

=l^j+0 {n-') , (2.5a) 
VarifcYl ^1A + n-'ojA (0) + O {n-') , (2.5b) 

where 7^ = (I/cti + $2/ '^2 + /^l/^l) '^'^'^ explicit expression of uj_a (9) is 
given in Appendix C. 

From Proposition 2, fL^^ is an asymptotically unbiased estimate of /Xj as 
n goes to 00. The variance of fi^^^ does not converge to zero asymptotically, 
instead, it converges to 7^. When n is small or moderate, the second term 
in the expansion of Var(/x^''^), i.e. n~^(u;^(0), may not be negligible. When 
n is sufficiently large, it becomes negligible, and the variance of fi^^ can be 
approximated by 7^. By replacing the structural parameters in 7^1 with their 
corresponding estimates, we obtain the estimated variance and standard error 
of Aj^^i which are 7^ and \/x4) respectively. 
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Note that the qRT-PCR measurement for is Xj, which is an unbiased 
estimate of /ij and its reproducibihty is af. Because 7^ is less than a^, fi^^ is 
more accurate than Xj when n is sufficiently large. Therefore, by combining the 
measurements from all three platforms, we can obtain a more accurate estimate 
of the expression level of gene j . 

Genes in B — A are measured by microarray and RNA-Seq but not by qRT- 
PCR. Therefore, only Yj and Zj are available for j G B — A. For each j € B — A, 
given the structural parameters, the measurement error models involving fij can 
be rewritten as a linear model for fij (Appendix B). Replacing the structural 
parameters 6 with their estimates and applying the generalized least squares 
method, the estimate of fij is 

■^yz ^ P2 {Yj - Aa) /al + {Zj - as) /ag ,^ 

We call fiY the calibrated estimate of ^j for j ^ B — A. The properties of 
as n goes to 00 are presented in the following proposition. 

Proposition 3. Assume the functional system of ME models holds, and the 
limits in (2.3) exist. For j Cz B — A, as n goes to 00, the calibrated estimate 
fiY of Hj asymptotically follows the normal distribution N {pLj^^^^j^) , and the 
mean and variance of ft^ admit the following expansions: 

E{fif'^)=^iJ+0{n-'), (2.7a) 
Var{fi^/'^) = ^B-A + n-^LOB-A {0) + O (n'^) , (2.7b) 

where jb~A = (/^l/ci + /^l/^'l) ^■'^'^ explicit expression of iUB~A{0) is 
given in Appendix C. 

From Proposition 3, is an asymptotically unbiased estimate of //j, and 
the asymptotic variance of is Jb-A- When n is small or moderate, the term 
n~^u!B-A {0) in the expansion of the variance of fiY' is not negligible. When 
n is sufficiently large, u^^ujb-A {0) becomes negligible, and thus the standard 
error of fcY can be approximated by \/jb-A, where jb-A is the estimate of 
jB-A- It is clear that jb-A is larger than 7^, which implies that the calibrated 
estimate of fj.j based on the measurements of RNA-Seq and microarray is less 
accurate than the calibrated estimate based on those by all three platforms. 
Further compare fiY with the microarray measurement Yj and the RNA-Seq 
measurement Zj. Because Jb-A is less than either of cr|//3| and (t|//3|, which 
are the reproducibilities of Yj and Zj, respectively, fiY i^ niore accurate than 
Yj and Zj. 

Genes in C — i3 are only measured by RNA-Seq but not by qRT-PCR and 
microarray. Therefore, only Zj are available for j G C — B. For each j G C — B, 
given the structural parameters, the measurement error models involving jij 
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can be written as a linear model for fij (Appendix B). Replacing the structural 
parameters with their estimates, the estimate of fXj is 

A| = [Z, - da) //33. (2.8) 

We call /i| the calibrated estimate of fij for j E C — B. The properties of fij as 
n goes to oo are presented in the following proposition. 

Proposition 4. Assume the functional system of ME models holds, and the 
limits in (2.3) exist. For j E C ^ B, as n goes to oo, the calibrated estimates 
flj of fij asymptotically follows the normal distribution N (fj,j,jc-B), o.'n-d the 
expectation and variance of fi^ admit the following expansions 

E{fi'^) = li,+0{n-^), (2.9a) 
Far(A|) = ic-B + n-^uJc-B (0) + O {n-'') , (2.9b) 

where Jc-B — (/^l/'^i) ^; '"^'^ explicit expression of ujc-b (f^) is given in 
Appendix C. 

From Proposition 4, /i| is an asymptotically unbiased estimate of jij. When 
n is small or moderate, the term n~^uJc-B{(^) in the expansion of the variance is 
not negligible. When n is sufficiently large, n^^uJc-B{d) becomes negligible, and 
the variance of /i| can be approximated by jc-B- The approximate standard 

error of /t| is \J^c-B^ where 7c-e is the estimate of ^c-B- It is clear that 
7C-B is larger than ^b-A and 7^, which implies that the calibrated estimate /i| 
based on the RNA-Seq measurement alone is less accurate than ffF^ and /ij*'^. 
Because 7c-e is equal to (t|//3|. which is the reproducibility of Yj , /i| has the 
same reproducibility as the RNA-Seq measurement Yj . The reason for using /i| 
is that it is in the same scale as Aj^^ and fvi^ so that the calibrated estimates 
in C — S are comparable with those in A and B — A. 

The calibrated estimates {/i^^^ : j e A), {fi^' : 3 E B - A^ and : j e 
C — i3} can also be considered gene expression measurements normalized by the 
qRT-PCR platform. As discussed above, the standard errors of these three types 
of calibrated estimates are different, which are VxA: \/lB-Ai and \/^c-B for 
genes in A, B—A and C — ;S, respectively. When used for gene expression analysis 
such as detecting differentially expressed genes, both the calibrated estimates 
and their standard errors need to be used. 

2.2. Multi-lab Scenario 

Under the multi-lab scenario, it is possible to decompose the measurement error 
due to a platform into the measurement error due to the technology of the 
platform and the measurement error due to lab. For example, the measurement 
error due to the qRT-PCR platform can be decomposed into the measurement 
error due to qRT-PCR and the measurement error due to lab. Denote the qRT- 
PCR measurement data by Xjk where j and k are indices for genes and labs, 
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respectively, with 1 < fc < fci. Similarly, denote the niicroarray measurement 
data by Yjk with 1 < k < k2 and the RNA-Seq measurement data by Zjk 
with 1 < fc < fcs- The involved labs are considered to be nested within the 
platforms. For convenience, in the rest of this article, we refer to the qRT- 
PCR, microarray and RNA-Seq platforms as the 1st, 2nd and 3rd platforms, 
respectively. The following system of ME models can be used to model the gene 
expression measurements obtained by the three platforms from multiple labs, 

Xjk = + f^ij + TTijfe (1 < j < 1 < fc < fci) , (2.10a) 

Yjk = 02 + P2t^j + K2j + 7V2jk < j < m; 1 < fc < fc2) , (2.10b) 

Zjk = as + P^fJ-j + K3j + TT'ijk {l<j<l; 1 < fc < fc3) , (2.10c) 

where Kij is the measurement error due to the technology of the ith platform 
for 1 < « < 3, and nikj is the measurement error due to the fcth lab of the ith 
platform. We further assume that the measurement errors are independent with 
each other, Kij's are i.i.d. N{0, i;f), and iTijkS are i.i.d. N (O, ipf) . Note that when 
fci = ^2 = fcs = 1, the system of ME models (2.10) under the multi-lab scenario 
reduces to the system of ME models (2.1) under the single-lab scenario, and 
Cij" = fiij + TTyi for 1 < z < 3. When fci, ^2, fcs > 2, the structural and incidental 
parameters in the system of ME models (2.10) can be estimated in a similar way 
as those in the system of ME models (2.1). Due to limited space, these results 
are not reported in this article, instead, they are included in Appendix D. 

The real data set wc will analyze in Section 4 contains qRT-PCR data from 
one lab, RNA-Seq data from two labs, and microarray data from five labs. In 
what follows, we present the estimation results for the system of ME models 
(2.10) with fci = 1, fc2 > 2 and fca > 2. When fci = 1, nij and niji become 
inseparable, and consequently, the two variance components and cannot 
be estimated separately, instead, only their sum = (^f + ipl is estimable. 

Since fci = 1, we suppress the index fc in Xj}^ and define X the same way as 
under the single-lab scenario. Let Yj be the mean of Yj^ over the fc2 labs, and Y 
is the grand mean across all of the genes in A. For the RNA-Seq data, , and 
Z_ are defined in similar ways as K,. , and Y of the microarray data. Denote the 
sample variances and covariances of {^j-) i<j<n ^^"^ i^i } i<j<n 

Sxx, Syy, Szz, Sxy, Sxz Siiid Syz- Thc two-stcp approach used for parameter esti- 
mation under the single-lab scenario can still be used under the multi-lab sce- 
nario. First, we derive the estimates of the structural parameters. The estimates 

of the linear coefficients ai and (3^ for z = 2, 3 are P2 = Syz/ Sxz, = Syz/Sxy, 

a2 — Y, — $2^,, and = Z. — PsX.. The estimates of the other structural 
parameters are given as follows. 

'S..-^], (.lla) 



^2 



-2 _ i a SxySyz \ 1^2 -2 _ ( Q SyzSx:: \ V'l 



Let (j) be the vector of the structural parameters, that is, cj) = (a2, as, /32 , /^a , cr^ , '01 , ■01 , , <j|) 
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, and <p the vector of the estimated structural parameters. The foUowing propo- 
sition estabhshes the asymptotic properties of </> under the functional system of 
ME models with multiple labs as n goes to oo. 

Proposition 5. Assume the system of ME models (2.10) with ki = 1, ^2 > 
2 and > 2 holds, and the limits in (2.3) exist. Then, as n goes to oo, 

■i/n (^ij) — (pj ^ N{0,Acf,), where ^ means converge in distribution, and 
is the variance- covariance matrix with its explicit expression given in Appendix 
E. 

Once the estimates of the structural parameters are obtained, we can plug 
them back into the system of ME models so that the incidental parameters 
(i.e. /ij 's) can be estimated by the generalized least squares method. Again, the 
resulting estimate of depends on whether j is in A, B — A, or C — B as under 

the single-lab scenario. Define Ai ^ erf, and Ai = + tpf /ki for i = 2,3. Let A.; 
be the estimate of Ai for 1 < * < 3. The estimates of the incidental parameters 
are given as follows. 

X./Ai + in - aa) /A2 + {Zj. - as) /Xz 

w,, = 7^—. — for J £ A, (2.12a) 

1/Ai + /3|/A2 -f /3|/A3 

,orjeB-A, (2.12b) 

/32VA2+/?|/A3 

= ^"'^"^ forjeC-S. (2.12c) 

The variances of the three types of estimates given above have the same forms 
as their counterparts under the single-lab scenario. The only difference is that 
af under the single-lab scenario should be replaced by Ai under the multi- 
lab scenario. Based on their asymptotic variances, the standard errors of the 
estimates can also be obtained. The estimates {fij^^ : j & A}, {f^Y : j ^ B — A} 
and {//| : j € C — B}, together with their standard errors, can be used in 
downstream gene expression analysis. 



3. Simulation 

An extensive simulation study was conducted to evaluate the performance of 
the proposed calibration methods. In this section, we report the simulation 
study results for three settings of the system of AIE models under the single-lab 
scenario. Simulation study results under the multi-lab scenario are similar to 
those under the single-lab scenario, and thus are not reported in this article. 

The structural parameters of the three settings are listed in Table 1. The 
incidental parameters (i.e. /Xj's) for each model setting were randomly drawn 
from A^(0,5). For each model setting, we independently generated a training 
data set of 300 genes and a testing data set of 1000 genes. Both data sets contain 
the measurements of the genes by all three platforms. The measurements of the 
first n genes in the training set were used to estimate the structural parameters 
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(n was varied from 20 to 300), and then the three types of cahbrated estimates 
of the expression levels of the genes in the testing set were obtained. For each 
combination of model setting and n, this procedure was repeated 200 times to 
calculate the MSEs of the estimates for the purpose of performance evaluation 
and comparison. 



Table 1 

The Settings of Model Parameters in Simulation Study 





Intercept 


slope 


Reproducibility 








/33 




^2 ^3 


Setting 1 


-0.5 0.1 


1 


1 


0.6 


1.5 0.8 


Setting 2 


0.02 0.2 


0.9 


0.95 


0.5 


1 0.75 


Setting 3 


-0.2 0.01 


1.3 


1.2 


0.6 


1 1.2 



We first examine and compare the variances of the calibrated expression levels 
, jj-Y^ fij. Recall that Var(/i^^^) = 7^+ n~-^uj^{6) + 0{n~^). When n is 
large, 7^ is the dominating term, and the other higher order terms are negligible; 
but when n is moderate (e.g. n £ [20,50]), the second term n~^a;^(0) is not 
negligible. Further notice that 7,4 depends on the structural parameters only, 
but n~^u!^{6) also depends on the incidental parameters, particularly (/ij — p,)^ 
as shown in Appendix C. This implies that the variance of jij^^ increases as fij 
deviates away from the mean fl when n is moderate. The variances of [1^ 
fij behave in the same way as the variance of A*^^^- 




Fig 1: Variances of calibrated estimates of gene expression levels 
Plots (a), (b) and (c) are for | yar(/i|''^)|, |yar(/ij^)| and |yar(/i|)|, respectively. 



Figure 1 shows the plots of the sample variances of ft 



j" ,11^ , and jlj versus 
the incidental parameters fij for the second model setting and three different 
sizes of training data subsets (i.e. n = 20, 50, and 300). For convenience, we refer 
to the curve generated by plotting the variance of a type of estimate against the 
incidental parameter as the variance curve of the type of estimate. Plot (a) in 
the left panel includes the variance curves of jj,^^^ for n = 20, 50, and 300 from 
the top to the bottom, respectively. Plot (b) in the middle includes those of /t^^. 
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and Plot (c) in the right panel includes those of fij. In each plot, the variance 
curve for n = 20 is above that for n ~ 50, which is above that for n = 300, 
indicating that as n increases, the variance of the respective estimate decreases. 
Furthermore, in each plot, the variance curve for n = 20 demonstrates a strong 
quadratic pattern, indicating its dependence on (/ij — /i)^, the variance curve 
for n = 50 shows a much mitigated quadratic pattern, and the variance curve 
for n ~ 300 becomes flat. Comparing the variance curves across the plots, it is 
clear that under the same n, the variance curve of A^^^ is lower than that of 
/ij^, which is lower than that of indicating that in terms of accuracy, the 
order of the three types of estimates, from the best to the worst, is /tj^^, fi^ 
and jlj . These results confirm the properties of the variances of the three types 
of estimates discussed in Section 2. 

We further examine the average MSEs of the three types of calibrated esti- 
mates over all of the genes in the testing data set. For convenience in discussion, 
we refer to the scatter plot of the average MSE of a type of estimate over all of 
the genes in the testing data set versus the training data size n as the aMSE 
curve of the type of estimate. Note that Xj is also an unbiased estimate of Hj, we 
also compare the average MSEs of Xj with the three types of estimates. Figure 
2 demonstrates the aMSE curves of the three types of estimates {fij^^}, {/ij^}, 
and {/tj} as well as that of the original qRT-PCR measurements {Xj,}. Plots 
(a) , (b) and (c) in Figure 2 from the left panel to the right panel correspond to 
the three model settings given in Table 1, respectively. 




(a) (b) (c) 



Fig 2: Average MSEs of Original Measurements and Calibrated Estimates 
Plots (a), (b) and (c) are for model settings 1, 2 and 3, respectively. 

The aMSE curves of {Xj} is flat because they do not depend on n. Overall, 
the aMSE curves of {fi^^^}, {Aj^li and {p-j} decrease as n increases, and they 
become flat after n is greater than 100, indicating that the structural parameters 
are accurately estimated and the benefit of further increasing n in the training 
data set becomes negligible. 
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In all of the plots, the aMSE curves of {ft^^^} are always the lowest, indicating 
that {fJ-J^"} give the most accurate quantification of the gene expression levels. 
The aMSE curves of {fij^} ^^'^ always below the aMSE curves of indicat- 
ing that {fij^} is more accurate than However, the aMSE curve of {/t^^} 
is not always below that of {Xj}, indicating that {/ij^} are not necessarily more 
accurate than the qRT-PCR measurements. In Plot (a), when n is sufficiently 
large (e.g. >100), the aMSE curve of {A}^} nearly overlaps with that of {Xj}, 
indicating that under the first model setting and when the estimates of the 
structural parameters are sufficiently accurate, the calibrated expression levels 
are as accurate as the qRT-PCR measurements {Xj}. In Plot (b), the aMSE 
curve of {fij^} above that of {Xj}, indicating that for the second model set- 
ting, {fij^} successfully calibrated the RNA-Seq and microarray measurements 
towards the qRT-PCR measurements but did not outperform qRT-PCR. In Plot 
(c), the aMSE curve of {/i^^} is below that of {Xj}, indicating that {/xj^} are 
more accurate than {Xj} under the third modeling setting. The performance 
of {/xj^} relative to {Xj} depends on the model settings. In ah of the plots, 
the aMSE curves of is always above that of {Xj}, indicating that the cal- 
ibrated estimate based on RNA-Seq measurement alone cannot outperform the 
qRT-PCR measurement. 

As discussed in Section 2, the structural parameters are fundamentally differ- 
ent from the incidental parameters. As n increases, the estimates of the struc- 
tural parameters will converge to their respective targets as shown in Proposi- 
tion 1. The simulation results about the structural parameters can be found in 
Appendix F. 

4. Application and Results 
4.I. Data Sets 

We applied the system of ME models to analyze the qRT-PCR, RNA-Seq, and 
microarray gene expression data of two RNA samples, namely the human brain 
reference RNA sample (or in short, the Brain sample) and the human univer- 
sal reference RNA sample (or the UHR sample), generated by the Microarray 
Quality Control (MAQC) and Sequencing Quality Control (SEQC) projects. 

The qRT-PCR data were generated by one single lab using TaqMan(R) Gene 
Expression Assays and can be downloaded from Gene Expression Omnibus 
(GEO) with the series number GSE5350 (ftp://ftp.ncbi.nih.gov/pub/geo/ 
DATA/supplementary/series/GSE5350/). The data originally contain the qRT- 
PCR measurements of 1001 genes, among them 7 genes have multiple entries 
with distinct expression values but under the same RefSeq ID. To avoid ambi- 
guity, these genes were removed. Each gene has 4 technical replicates for each 
of the UHR and Brain samples. 

The RNA-Seq data include measurements generated from two RNA-Seq ex- 
periments conducted in two different labs using Illumina Genome Analyzer. The 
data from the first lab [5] can be downloaded from NCBI Sequence Read Archive 
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(SRA) (http : / /www .ncbi .nlm.nih.gov/sra/) under the accession number SRA010153, 
and the data from the second lab can be downloaded from the same website un- 
der the accession number SRA008403. For convenience, we will use the accession 
numbers to refer to these two data in the rest of the article. The RPKM value 
is calculated lane-by-lane for each gene. For each of the Brain and UHR sample, 
those genes that received no reads in at least one lane of the two RNA-Seq 
experiments were excluded. (See Appendix G for more information of the two 
RNA-Seq data.) 

The microarray data were generated by five different microarray experiments 
conducted in five different labs using Affymetrix U133 Plus2.0. For convenience, 
we label the labs as MAI to MAS in the rest of the article. The original 
probe level data can be downloaded from Gene Expression Omnibus (GEO) 
with the series number GSE5350 (ftp://ftp.ncbi.nih.gov/pub/geo/DATA/ 
suppleinentary/series/GSE5350/). Data from each lab have 5 replicates for 
both UHR and Brain samples. For each replicate in each lab, the average probe- 
level measurement of a gene is considered the gene's expression level intensity. 

We integrated the three different types of gene expression data as follows. 
First, the log-2 transformation was applied to all three types of gene expression 
data. Then for each gene and each lab, the mean measurement across technical 
replicates is used as the gene's expression value. Furthermore, we exclude genes 
with extremely low or extremely high expression levels (i.e. those with qRT-PCR 
expression measurements below —6 or above 4 in log-2 scale) in the remaining 
genes in A are used to estimate the structural parameters. For the Brain sample, 
there are 409 genes that have expression data from all three platforms (A), 6419 
genes that have only RNA-Seq and microarray measurements {B — A), and 5949 
genes that only have RNA-Seq measurements (C — B); for the UHR sample, there 
are 477 genes that have measurements by all three platforms {A), 7187 genes 
that have measurements only by RNA-Seq and microarray (B — A), and 6892 
genes that have only measurements by RNA-Seq {C — B). 

Two schemes were used to analyze the integrated data. First, we considered 
the single-lab scenario, in which each platform has data from one lab. In total 
there are ten possible combinations, and we applied the system of ME models 
(2.1a)-(2.1c) to each combination. The analysis results of individual combina- 
tions were similar, and we only report the results for the combination that in- 
cludes the RNA-Seq data from SRA010153 and the microarray data from MAI. 
Second, we applied the system of ME models for the multi-lab scenario to ana- 
lyze the entire data set for each RNA sample. Due to limited space, the results 
from the multi-lab scenario are presented and briefly discussed in Appendix I. 

4-2. Results under Single-Lab Scenario 

4-2.1. Diagnostics of Model Assumptions 

The system of ME models (2.1a)-(2.1c) imposes the normality and homoscedas- 
ticity assumptions on the measurement errors. We checked these assumptions 
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using genes in A. After the system of ME models was fitted, we calculated the 
residuals by eij = Xj — fij , e2j = Yj — 612 ~ /32Ai ^^^^ ~ ^3 ~ "^3 ~ PiP^j 
corresponding to the measurement errors due to the qRT-PCR, microarray and 
RNA-Seq platforms, respectively. To check the normality assumption, we gen- 
erated the QQ plots for the residuals, and did not detect significant violation of 
the assumption. To check the homoscedastieity assumption, we generated resid- 
ual plots and constructed approximate 95% confidence intervals of the Box-Cox 
transformation. Because the diagnostic results are similar for the two RNA sam- 
ples, we only present those from the UHR sample as an example. The residual 
plots corresponding to the qRT- PGR, microarray, and RNA-Seq platforms are 
presented in Figure 7 in the Appendix H. The plots do not demonstrate strong 
heteroscedastic patterns. The 95% confidence intervals of Box-Cox transfor- 
mation for the residuals from the three platforms are presented in Figure 8 
in Appendix H. The confidence intervals corresponding to the qRT-PCR and 
RNA-Seq platforms both contain 1, and 1 is on the boundary of the confidence 
interval corresponding to the microarray platform. These results together in- 
dicate that there does not exist significant violation of the homoscedastieity 
assumption imposed on the platform measurement errors. 

4-2.2. Structural Parameters 

The estimates of the structural parameters for the Brain and UHR samples 
are given in Table 2. The standard errors of the estimates are also reported in 
parentheses in the table. The estimates of the structural parameters can be used 
to compare the three platforms in terms of the quality of measurements they 
provide. 

From models (2.1a)-(2.1c), the qRT-PCR measurements are unbiased with 
respect to Hj's, and the intercepts a2 and represent the biases of microarray 
and RNA-Seq measurements relative to the qRT-PCR measurements. Larger 
absolute values of a2 and indicate larger biases. From Table 2, 6:2 are 8.8401 
and 9.1033 in the Brain and UHR samples, respectively; and 0.3 are 4.9405 and 
5.4249 in the Brain and UHR samples, respectively. In both samples, a2 > da, 
indicating that microarray measurements have larger biases than the RNA-Seq 
measurements. The slopes /32 and ^3 represent the scales of microarray and 
RNA-Seq measurements relative to qRT-PCR measurements. From Table 2, /32 
are 0.7754 and 0.7695 in the Brain and UHR samples, respectively, both of which 
are significantly less than 1, indicating that the microarray measurements are in 
a smaller scale compared to the qRT-PCR measurements. On the other hand, 
$3 are 1.0254 and 1.0009 in the Brain and UHR samples, respectively, both of 
which are not significantly different from 1, indicating that the RNA-Seq mea- 
surements are in the similar scale as the qRT-PCR measurements. The variances 
(7i, cr|//3| and erf//?! refiect the reproducibilities of the three platforms. Smaller 
value indicates higher reproducibility. From Table 2, in the Brain sample, the 
three values are 0.7945, 2.0635 and 1.0157; and in the UHR sample, the three 
values are 0.6685, 1.7638 and 0.9435. In both samples, al < af/Pj < (7|/^|, 
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indicating that qRT-PCR has the best reproducibihty, microarray has the worst 
reproducibihty, and RNA-Seq is shghtly worse than qRT-PCR but much better 
than microarray. 



Table 2 

Estimates of Structural Parameters Using the qRT-PCR Data, RNA-Seq Data from 
SRA010153, and microarray Data from MAI 





Intercept 


Slop 


e 




Variance 






a2 














Brain 


8.8401 
(0.0724) 


4.9405 
(0.0859) 


0.7754 
(0.0216) 


1.0254 
(0.0324) 


0.7945 
(0.1143) 


1.2407 
(0.1031) 


1.0679 
(0.1283) 


UHR 


9.1033 


5.4249 


0.7695 


1.0009 


0.6685 


1.0444 


0.9452 


(0.0605) 


(0.0692) 


(0.0212) 


(0.0146) 


(0.0947) 


(0.0836) 


(0.1040) 



4-2.3. Incidental Parameters 

After the estimates of the structural parameters were obtained, we used the 
formulas obtained in Section 2 to estimate the gene expression levels, and the 
standard errors of the calibrated gene expression values were also calculated. 
These calibrated gene expression values together with their standard errors are 
expected to have a higher quality than the original measurements by the three 
platforms and lead to better results in downstream gene expression analysis. 

We compare the calibrated gene expression levels with their original qRT- 
PCR, microarray, and RNA-Seq measurements, and use the genes in the Brain 
and UHR samples that have measurements by all three platforms as an illustra- 
tive example. Because these genes have measurements by all three platforms, all 
three types of calibrated estimates {Aj^}; ^-^d {A^^^l could be calculated. 

Based on the theoretical and simulation results in Sections 2 and 3, are 
the most accurate measurements. We plotted the original measurements {^j}, 
{Yj} and {Zj} as well as the calibrated measurements {Aj^} against {/ij^^}, and 
presented the plots in Figure 3 and Figure 4 for the Brain and UHR sample, re- 
spectively. Because {ftj} are the linear transformation of the original RNA-Seq 
measurements {Zj}, the plot of {A|} versus {fij"^} had the same appearance as 
the plot of {Zj} versus {fi^^^} and thus was not presented. In each plot, the cor- 
relation coefficient between the two plotted measurements was calculated and 
reported. 




Fig 3: Scatter Plots of the measurements by microarray, RNA-Scq, qRT-PCR and /i^^ versus 
fi^y^ on ^ in the Brain Sample 
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From Figure 3, in the Brain sample, in terms of the hnear correlation coeffi- 
cient with the best measurements {fi^^^}, microarray is the worst (0.8112) and 
qRT-PCR is the best (0.9403) among the three platforms. This is consistent with 
the findings reported in the literature that in terms of the quality of the gene 
expression data produced by the three platforms, qRT-PCR is most accurate, 
and microarray is least accurate. The calibrated expression levels {/i^^} have a 
higher correlation coefficient with {/i^*'^} (0.9551) than the qRT-PCR, microar- 
ray, and RNA-Seq measurements. In the UHR sample, in terms of the linear 
correlation coefficient with the best measurements {Aj^^}, again microarray is 
the worst (0.8553) and qRT-PCR is the best (0.9583) among the three platforms. 
The calibrated expression levels {Aj^} have a higher correlation coefficient with 
{fij^^} (0.9645) than qRT-PCR, microarray, and RNA-Scq measurements. 



4-2.4- Gene Differential Expression 



The goal of gene differential expression analysis is to identify genes that are 
differentially expressed across different biological or environmental conditions. 
It is of interest to find genes that have different expression levels in the Brain 
and UHR samples. To demonstrate that calibrated expression levels can lead to 
better gene differential expression analysis, we used the RNA-Seq measurements 
(Zj's) and the calibrated measurements (Aj's), separately, to perform gene dif- 
ferential expression analysis for the two RNA samples. For both the calibrated 
measurements and the RNA-Seq measurements, we used two methods, which 
are z-test and local FDR [9], to detect differentially expressed genes and com- 
pared their results. The results from these two methods are discussed separately 
below. 

For each RNA sample, the calibrated expression level of a gene was /t^*'^. 



ftj , or fij depending on whether the gene is in A, B — A, or C — B. In what 
follows, for convenience, we do not further distinguish the notation for these 
three types of calibrated estimates. Instead, we use /ij'^^^'' and ^(^''''™) iq 
denote the calibrated expression levels of gene j in the UHR and Brain samples, 
respectively, and use Lp^^^^^'^ and (^(■^'"°™) to denote the estimated variances of 

^(UHR) , -.{Brain) , 

\Xa and /i"- , respectively. 



Z. SUN et al. /Statistical Calibration of Gene Expression Data 



20 



When performing z-tcst based on the cahbrated measurements for gene j, the 

{UHR) (Brain) 



p- value tor testmg Hq : jjr- 



Pj =2P \ Z > 



. (Brain) . (UHR) 



-(Brain) , -(UHR) 
if) '+ip) 



(4.1) 



where Z foUows the standard normal distribution. Based on the calculated p- 
values, the standard Benjamini-Hochberg procedure ([4]) was used to identify 
differentially expressed genes with the false discovery rate (FDR) controlled 
at 0.01. The numbers of the identified genes are reported in Table 3. Simi- 
larly, based on the RNA-Seq measurements, we applied z-test together with the 
Benjamini-Hochberg procedure to detect differentially expressed genes between 
the UHR and Brain samples. The numbers of the identified differentially ex- 
pressed genes based on the RNA-Seq measurements arc reported in the second 
column of Tabic 3. 

From Table 3, in total, 407 genes were detected to be differentially expressed 
in the UHR and Brain samples using the calibrated expression measurements, 
whereas only 158 genes were detected using the original RNA-Seq measure- 
ments. These two groups of genes share 154 genes. Therefore, the calibrated 
expression measurements led to the detection of almost all the genes (154 out 
of 158 genes) detected by the original RNA-Seq measurements. In addition, the 
former detected 249 more genes than the latter. The total number of genes de- 
tected by both types of measurements was further broken down according to 
whether a gene belongs to ^, S — ^, or C — B, and the results are also re- 
ported in Table 3. From the table, 47, 142 and 60 more genes were detected by 
the calibrated expression measurements in A, B — A and C — B, respectively, 
than by the original RNA-Seq measurements. Therefore, at the same FDR, the 
calibrated measurements led to a higher power for gene differential expression 
analysis than the original RNA-Seq measurements when z-test was used. 

In examining the selected genes, the 253 genes that are selected by the cali- 
brated expression levels but not by the RNA-Seq measurements are of particular 
interest. The annotations of these genes suggest that half of them are related 
to brain functions or have been involved in past brain related studies. Among 
these genes, some are found to be brain specific genes such as FABP7, KCNA6, 
SNPH, CREG2 and ; some are found to be highly expressed in brain, for exam- 
ple TRIL, CYP26B1, RAPGEF5, BHDS and PPP2R5B; and some are found 
to be lowly expressed in brain, for example PAQR6. 
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Table 3 

Numbers of Differentially Expressed Genes Detected by Calibrated Estimates and 

RNA-Seq Measurements 







z-test 






Local FDR 




Gene Set 


Calibration 


RNA-Seq 


Overlap 


Calibration 


RNA-Seq 


Overlap 


A 


56 


9 


9 


13 


8 


8 


B-A 


202 


60 


60 


153 


164 


135 


C~B 


149 


89 


85 


220 


176 


176 


Total 


407 


158 


154 


386 


348 


319 



For both the calibrated and RNA-Seq measurements of the two RNA samples, 
we further used the local FDR method to identify differentially expressed genes 
with the false discovery rate controlled at 0.01. The numbers of the identified 
differentially expressed genes are reported in the fourth and fifth columns of 
Table 3. 

From Table 3, in total, 386 genes were detected to be differentially expressed 
in the UHR and Brain samples by the calibrated expression measurements, 
whereas 348 genes were detected by the original RNA-Seq measurements. These 
two groups of genes share 319 genes. The calibrated measurements led to the 
detection of 38 more genes than the RNA-Seq measurements. The total number 
of genes detected by both types of measurements was broken down according 
to whether a gene belongs to B — A, or C — S, and the results are also 
reported in Table 3. From the table, 5 and 44 more genes were detected by 
the calibrated expression measurements in A and C — B. lx\ B — A, the RNA- 
Seq measurements led to the detection of 11 more genes than the calibrated 
expression measurements. 

Examining the 67 genes that were detected by the calibrated measurements 
but not by the RNA-Seq measurements, gene annotation suggested that 50 of 
them are related to brain functions or have been involved in past brain related 
studies. Among the 29 genes that were detected by the RNA-Seq measurements 
but not by the calibrated measurements, 20 of them arc related to brain func- 
tions or have been involved in paste brain related studies. 

5. Discussion 

A system of functional measurement error (ME) models was proposed to cali- 
brate the microarray and RNA-Seq measurements of gene expression levels by 
qRT-PCR. Due to limited space, the design issue of the proposed approach was 
not discussed in this article. The success of the proposed approach hinges on 
the genes that are measured by all three platforms, and the major bottleneck 
is the relative low throughput of the qRT-PCR platform. Therefore, the design 
issue is centered on the qRT-PCR platform with respect to two questions. The 
first question is how many genes should to be measured by qRT-PCR, and the 
second question is which genes should be measured. For the first question, based 
on the theoretical, simulation, and real data application results in this article, it 
seems that at least 150 genes are needed to ensure that the bias and variances 
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(i.e the structural parameters) can be accurately estimated and the calibrated 
gene expression levels (i.e. the incidental parameters) can reach their best possi- 
ble accuracy. Vendors of the qRT-PCR platform such as Life Technologies now 
offers assays and services to measure a sufficiently large number of genes si- 
multaneously. This makes our proposed approach feasible in practice. For the 
second question, based on our theoretical results, the true expression levels of 
the genes selected to be measured by qRT-PCR should be as spread out as pos- 
sible. These two questions and the design issue in general will be addressed in 
a future publication. 

As discussed in the introduction, RNA-Scq data arc also found to be subject 
to excessive variability and various methods have been proposed to normalize 
RNA-Seq data. In this article, only the RPKM method and the resulting RPKM 
measurements were used as the RNA-Seq measurements. Clearly, other normal- 
ization methods and their resulting measurements can also be considered. The 
variance component due to the RNA-Seq platform estimated under the multi- 
lab scenario (i.e. <;|) in this article corresponds to the RPKM method. If another 
normalization method (e.g. the "mseq" method) is used, then will correspond 
to that normalization method. Therefore, the system of ME models can be used 
to compare different normalization methods. 

The functional system of ME models of order 3 can be extended to that of 
order p > 3, and the two-step approach to parameter estimation can also be 
extended in a straightforward manner. In this article, the bias of the measure- 
ment of a platform is assumed be a linear function of the true expression level. In 
general, nonlinear models or even nonparametric models can be considered. Fur- 
thermore, covariates can also be incorporated into the system of functional ME 
models. These possible extensions of the proposed approach will be investigated 
in the near future. 
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Appendix A: Proof of Proposition 1 

A.l. Asymptotic Consistency of 9 under Single-Lab Scenario 

Recall that A = lim„_i.oo ^ J2^=i (Mj ~ m)^- Note that Xj, Yj and Zj arc in- 
dependent normal random variables for all j. From classic asymptotic theory, 
sample moments of normal populations converge to their population counter- 
parts almost surely. Therefore, as n — >■ oo, we have 
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Based on the above convergence results, we have 
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A. 2. Asymptotic Variances of Estimates of the Structural 
Parameters under Single-Lab Scenario 

Let m = {X,,Y, Z,, Sxx, Syy, Szz, Sxy, Sxz, Syz)^ be the vector of sample mo- 
ments. Then is a function of m denoted as 9 (m). Proposition 1 can be derived 
by the delta method, and the covariance matrix of 6 is Tg = V J, where J 
is the first order derivative of ^ (m) w.r.t. m evaluated at E{m.), and V is 
the asymptotic covariance matrix of m. The elements of V can be obtained 
explicitly using the properties of moments of Normal distribution as described 
in Appendix l.B of Fuller [10]. Let rji = af for i = 1,2,3, and r]ij ~ rjirjj for 
i ^ j- The explicit form of Tq is described as follows. 
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Appendix B: Estimation of Incidental Parameters 

For each j G A, given the structural parameters, the measurement error models 
(la)-(lc) involving ^.j can be rewritten as a linear model of ^j, that is. 
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Replacing the structural parameters 6 with their estimates and applying the 
generalized least squares method, the estimate for fij is 



^xyz _ 
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For each j ^ B — A, given the structural parameters, the measurement error 
models involving can be rewritten as 



e2j 



Replacing the structural parameters B with their estimates and applying the 
generalized least squares method, the estimate of /ij is 

.y. ^ k - «2) lol + [Z, ~ as) l&l 

For each j ^ C — B, given the structural parameters, the measurement error 
models involving can be written as 

Zj - as = ^3A^j + esj - 

Replacing the structural parameters (aa, and ct|) with their estimates, the 
estimate of /ij is 
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Appendix D: Estimation under multi-lab scenario when all three 
platforms have data from multiple labs 

D.l. Estimation 

This section considers the estimates of parameters in the system of ME models 
under the multi-lab scenario with ki > 1 for i = 1, 2,3. Let Xj , Yj, and Zj, be 
the average measurements of gene j for the three platforms, and X.., Y, and 
Z.. the overall average measurements of all genes in A for the three platforms. 
Let Sxx, Syy, Szz, Sxy, Sxz, Syz bc the sample variances and covariances of 
{^i }i<j<„' {^i }i<j<„ ^'^'i {^j }i<j<„- Following the two-step approach, the 
estimates of the structural parameters are obtained as follows. 
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Let <f> be the vector of the structural parameters, that is, </> = (a2, as, /32, /33, 'A^, '/'I, "03: "^i 7 "^1 : ^1 )^ 
, and </) the vector of the estimated structural parameters. The following propo- 
sition establishes the asymptotic properties of (p under the functional system of 
ME models with multiple labs as n goes to 00. 

Proposition 6. Assume the system of ME models (2.10a)-(2.10c) with ki > 2 
holds for i ~ 1,2, 3, and the limits in Proposition 1 exist. Then, as n goes to 00, 
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■i/n (^cf) — ^ N (0, A^), where ^ means converge in distribution, and is 

the variance- covariance matrix with its explicit expression given in Section D.2. 

Once the estimates of the structural parameters are obtained, we can plug 
them back into the system of ME models so that the incidental parameters (i.e. 
tij's) can be estimated by the generalized least squares method. Similar as in 
Section 2, the beste estimate of the incidental parameter iij depends on whether 
j is in ^, B — A or C — B. Define = (if + i^f/ki for i = 1, 2, 3. The estimates 
of the incidental parameters are given as follows. 
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The variances of the three types of estimated incidental parameters given 
above have the same forms as their counterparts under the single-lab scenario. 
The only difference is that af under the singlc-lab scenario should be replaced 
by Xi under the multi-lab scenario. Based on their asymptotic variances, the 
standard errors of the estimates can be similarly obtained, which are denoted 
as y/jxyz, \flvz and VTz- The estimates {fi^^^ : j G A}, {j^Y ■ i G i3 — A\ and 
: j € C — B}, together with their standard errors, can be used in downstream 
gene expression analysis. 



D.2. Asymptotic Variances of the Estimates of Structural 

Parameters under multi-lab scenario when all three platforms 
have data from multiple labs 

Recall that A = lim„^oo ^ (Pj ~ m)^- Denote Xi ~ <rf -I- ipf /h for i = 

1,2,3. Denote Ay = XiXj for i,j = 1,2,3. The explicit form of is given as 
follows. 
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Appendix E: Asymptotic variances of the estimates of structural 
parameters under Multi-lab scenario when the 
qRT-PCR platform has data from one lab 

Recall that A = lim„_>oo ^ Z)i=i (/^j ~ f^f- Denote Ai = , Aj = + -^j^/fci 
for i = 2, 3. Denote Ay = A^Aj for i, j = 1, 2, 3. The explicit form of is given 
as follows. 
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Appendix F: Simulation Results of Structural Parameters under 
Single-Lab Scenario 

As discussed in Section 2, the structural parameters are fundamentally different 
from the incidental parameters, in that the former can be consistently estimated 
as the sample size n goes to infinity. In other words, as n increases, the estimates 
of the structural parameters will converge to their respective targets as shown 
in Proposition 1. Figure 5 shows the MSEs of (32 under the three model settings 
in Plot (a) and the MSEs of under the three model settings in Plot (b) 
as n increases. Figure 6 demonstrates the MSEs of af [i = 1,2,3) under the 
three model settings in Plots (a), (b), and (c), respectively. It is clear that, as 
n increases, all the MSEs decrease towards zero, indicating the convergence of 
the corresponding estimates. 
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Fig 5: MSB of Estimated Slopes 

Plot (a) shows the MSE of P2 and Plot (b) shows the MSB of as n increases. The 
solid, dashed and dotted lines correspond to model setting 1, 2, and 3 in simulation 

study. 



Fig 6: MSE of af 

Plot (a) shows the MSE of al, Plot (b) shows the MSE of al and Plot (c) shows the 
MSE of (t| as n increases. The solid, dashed and dotted Imes correspond to model 
setting 1, 2, and 3 in simulation study. 



Appendix G: RNA-Seq Experiments and Data Preprocessing 

The procedure of a RNA-Seq experiment is described as follows. First, the total 
RNA sample is purified to mRNA and further fragmented into small segments. 
These mRNA fragments are converted to cDNA by reverse transcription, and 
go through end repair, adapter ligation, size selection and amplification steps, 
resulting in a library of cDNA fragments. The cDNA fragments are then se- 
quenced by a sequencing machine, generating millions of reads (i.e. nucleotide 
sequence) of a certain length. Short reads are then mapped back to a reference 
genome, and are summarized as a sequence of counts. The counts at a nucleotide 
position is the number of reads that start at this position. Various normaliza- 
tion methods can be used to quantify gene expression level based on the counts 
data. In a RNA-Seq experiment using Illumina Genome Analyzer, multiple lanes 
(up to 7) can be used to sequence cDNA fragments and generate short reads, 
and the used lanes are treated as technical replicates. In this article, we used 
the average value of RPKM measurements (in log-2 scale) across the technical 
replicates (lanes) as the expression measurement for each gene. 
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For the RNA-Seq data used in this paper, SRA010153 contains short reads of 
35 base pairs from 7 lanes for both UHR and Brain samples, whereas SRA008403 
contains short reads of 31 base pairs from 7 lanes for the Brain sample but only 
from 3 lanes for the UHR sample. To avoid ambiguity caused by the unbalance in 
the number of lanes for the UHR sample, we use the first 3 lanes in SRA010153 
for the UHR sample. The short reads in both SRA010153 and SRA008403 were 
aligned to the UCSC hgl9 reference genome using Bowtie 0.12.7 (? ). Allowing 
up to two mismatches per read and counting only uniquely mapped reads, there 
resulted in about 2-4 million mapped reads per lane for SRA010153 and about 
0.7-0.8 million reads per lane for SRA008403. Using the Reference Sequence 
(RefSeq) database, the mapped reads were used to calculate the RPKM value 
for each gene. 

Appendix H: Residual Plots and Plots of Box-Cox Transformation 

The system of ME models (2.1a)-(2.1c) imposes the homoscedasticity and nor- 
mality assumption on the measurement errors for each platform. In order to 
check the homoscedasticity assumption, we generated residual plots and con- 
structed approximate 95% confidence intervals for Box-Cox transformations. 
The three plots in Figure 7 from left to right are the plots of residuals versus fij 
for qRT-PCR, microarray and RNA-Seq in the UHR sample, respectively. The 
three plots do not demonstrate strong heteroscedastic patterns. The three plots 
in Figure 8 from left to right present the 95% confidence interval for the Box-Cox 
transformation based on the residuals for qRT-PCR, microarray and RNA-Seq 
in the UHR sample. The 95% confidence intervals for Box-Cox transformations 
corresponding to the qRT-PCR and RNA-Seq platforms both contain 1, and 1 is 
on the boundary of the 95% confidence interval corresponding to the microarray 
platform. These results indicate that there does not exist significant violation 
of the homoscedasticity assumption on the platform measurement errors. 




(a) (b) (c) 

Fig 7: Plots of platform residuals for qRT-PCR (a), microarray (b) and RNA-Seq 
(c) based on the measurements in the UHR sample 
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Appendix I: Results under Multi-Lab Scenario 

In this section, we applied the system of ME models for the multi-lab scenario to 
analyze the entire expression data of the UHR and Brain samples. The benefit 
of using the entire data instead of a combination of the three types of data from 
single labs is two-fold. First, due to the inclusion of RNA-Seq and microarray 
data from multiple labs, we were able to decompose the variance component 
due to a platform into the variance component due to the technology of the 
platform and the variance component due to the involved labs, respectively, that 
is, af = ipi + for i = 2,3. Second, the estimates of the structural parameters 
and the calibrated measurements become more accurate with reduced standard 
errors. 

The estimates of structural parameters, together with their standard errors, 
are given in Table 4. We examine the estimated variance components of the 
RNA-Seq platform first. For the RNA-Seq platform, (f| = 0.8063 and V'f = 
0.1902 in the Brain sample; <;| 0.7422 and t/-^ = 0.0687 in the UHR sample. 
The variance components due to the RNA-Seq technology are about 5 and 10 
folds of the variance components due to the labs in the Brain and UHR samples, 
respectively. Therefore, the measurement errors in RNA-Seq measurements are 
mainly dominated by the measurement error due to the RNA-Seq technology. 
The sum of <f| and -01 can be used as an estimate of erf, which is the variance 
of the RNA-Seq platform, and we denote the resulting estimate as CTj. In the 
Brain sample, (t| = 0.9965, and in the UHR sample, = 0.8109, which are not 
significantly different from the values of o-| for the two RNA samples reported 
in Table 2 (Section 4.2.2 in the main article). 

For the microarray platform, <j| = 1.3190 and V'l = 0.0185 in the Brain 
sample, and if| = 1.0388 and ?/;| = 0.0193 in the UHR sample. Similar to the 
RNA-Seq measurements, the variance component due the microarray technology 
dominates the variance components due to the involved labs. The sum of the two 
variance components estimated under the multi-lab scenario is not significantly 
different from the variance component due to the combination of the microarray 
technology and lab estimated under the single-lab scenario, that is, V'l+'fl — 
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Table 4 

Estimates of Structural Parameters Using the Entire UHR and Brain Dataset 















-A 








as 






-I 


^1 








„ . 8.7822 
^''^''^ (0.0708) 


4.8970 
(0.0809) 


0.7416 
(0.0198) 


1.0001 
(0.0314) 


0.7135 
(0.1117) 


1.3190 
(0.1018) 


0.0185 
(0.0039) 


0.8063 
(0.1175) 


0.1902 
(0.0092) 


TTHR 9.1014 
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(0.0661) 


(0.0131) 


(0.0203) 


(0.0911) 


(0.0796) 


(0.0040) 


(0.0943) 


(0.0032) 



in both the Brain and UHR samples. 

After the estimates of the structural parameters were obtained, we used the 
three types of calibrated estimates of the incidental parameters to calibrate 
the gene expression levels and calculated their standard errors. The calibrated 
expression values based on the entire data are of a higher quality than those 
calculated in the single-lab scenario. These calibrated expression level measure- 
ments, together with their standard errors, can be obtained from the authors 
upon request. 



